Learning Visual Representations using Images with Captions
Abstract
Current methods for learning visual categories work well when a large amount of labeled data is available, but can run into severe difficulties when the number of labeled examples is small. When labeled data is scarce, it may be beneficial to use unlabeled data to learn an image representation that is low-dimensional but nevertheless captures the information required to discriminate between image categories. This paper describes a method for learning representations from large quantities of unlabeled images that have associated captions; the aim is to learn a representation that aids learning in image classification problems. Experiments show that the method significantly outperforms a fully supervised baseline model as well as a model that ignores the captions and learns a visual representation by performing PCA on the unlabeled images alone. Our current work concentrates on captions as the source of metadata, but more generally other types of metadata could be used (e.g., video sequences with accompanying speech).
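The PCA baseline mentioned above can be sketched as follows: fit PCA on the unlabeled image pool alone, project the scarce labeled examples into the resulting low-dimensional space, and train a supervised classifier there. This is a minimal illustration assuming scikit-learn; the random feature vectors, dimensions, and logistic-regression classifier are illustrative assumptions, not the paper's actual setup.

```python
# Illustrative sketch of the caption-free PCA baseline: learn a
# low-dimensional representation from unlabeled images alone, then
# train a classifier on a small labeled set. All data here is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins: 1000 unlabeled images and 20 labeled ones,
# each represented as a 256-dimensional feature vector.
X_unlabeled = rng.normal(size=(1000, 256))
X_labeled = rng.normal(size=(20, 256))
y_labeled = rng.integers(0, 2, size=20)

# Fit PCA on the unlabeled pool to obtain a low-dimensional representation.
pca = PCA(n_components=10).fit(X_unlabeled)

# Project the scarce labeled data and train a supervised classifier on it.
clf = LogisticRegression().fit(pca.transform(X_labeled), y_labeled)

preds = clf.predict(pca.transform(X_labeled))
print(preds.shape)  # one prediction per labeled image
```

The point of the contrast in the paper is that this baseline never looks at the captions, whereas the proposed method uses them to shape the learned representation.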
Related Papers
Text-Guided Attention Model for Image Captioning
Visual attention plays an important role in understanding images and has demonstrated its effectiveness in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns t...
Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning
We study how to generate captions that are not only accurate in describing an image but also discriminative across different images. The problem is both fundamental and interesting, as most machine-generated captions, despite phenomenal research progress in the past several years, are expressed in a very monotonic and featureless format. While such captions are normally accurate, they often la...
Learning from Images with Captions Using the Maximum Margin Set Algorithm
A large number of images with accompanying text captions are available on the Internet. These are valuable for training visual classifiers without any explicit manual intervention. In this paper, we present a general framework to address this problem. Under this new framework, each training image is represented as a bag of regions, associated with a set of candidate labeling vectors. Each label...
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
Inspired by recent advances in multimodal learning and machine translation, we introduce an encoder-decoder pipeline that learns (a): a multimodal joint embedding space with images and text and (b): a novel language model for decoding distributed representations from our space. Our pipeline effectively unifies joint image-text embedding models with multimodal neural language models. We introduc...
The effects of captioning texts and caption ordering on L2 listening comprehension and vocabulary learning
This study investigated the effects of captioned texts on second/foreign (L2) listening comprehension and vocabulary gains using a computer multimedia program. Additionally, it explored the caption ordering effect (i.e. captions displayed during the first or second listening), and the interaction of captioning order with the L2 proficiency level of language learners in listening comprehension a...
Journal: not listed
Volume/Issue: not listed
Pages: -
Publication date: 2006